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This  report  summarizes  recent  progress  by  Signal  Innovations  Group  (SIG)  in  supporting  the 
Naval  Research  Laboratory  (NRL)  on  the  development  and  application  of  mine  identification  algorithms 
using  a  Low  Frequency  Broadband  (LFBB)  sonar  system.  SIG  has  the  tasks  of  developing  the  algorithms 
and  transitioning  them  to  NRL  for  use  in  sea  tests.  Fully-transitioned  algorithms  have  performed  well  in 
sea  tests,  with  additional  development  increasing  the  computational  efficiency.  The  discussion  below 
provides  a  summary  of  the  following  items:  the  Kernel  Matching  Pursuits  (KMP)  classification  algorithm, 
the  correlation  kernel  method,  and  sample  results  from  an  NRL  sea  test. 

I.  Kernel  Matching  Pursuits 

A  short  historical  outline  of  the  identification  approaches  used  by  the  LFBB  program  is  given  to 
set  the  stage  for  the  present  algorithm  descriptions  and  discussion.  The  first  identification  algorithms 
were  based  on  hidden  Markov  models  (HMM).  Markov  models  are  effective  tools  for  representing 
sequential  data  and,  in  our  application,  predict  the  probability  that  sequential  target  strength  data  is  from  a 
particular  object.  The  sequential  target  strength  data  is  obtained  as  the  autonomous  underwater  vehicle 
(AUV)  transits  past  an  unknown  object.  This  approach  is  generative ,  i.e.  the  modeling  of  targets  and 
clutter  are  done  without  knowledge  of  the  other  and  identification  is  effected  by  choosing  the  model  that 
best  matches  the  unknown  data.  The  next  set  of  identification  approaches  explored  was  discriminative , 
i.e.  the  data  from  targets  and  clutter  are  simultaneously  considered  to  determine  the  best  boundary  that 
separates  targets  from  clutter.  Here  a  series  of  Bayesian  identification  algorithms  known  as  kernel 
algorithms  are  considered:  the  support  vector  machine  (SVM),  the  relevance  vector  machine  (RVM),  and 
the  kernel  matching  pursuits  (KMP).  All  of  these  kernel  identification  algorithms  have  very  similar 
performance  when  applied  to  the  LFBB  data.  The  transition  from  the  SVM  to  the  RVM/KMP  was 
primarily  motivated  by  compactness  of  the  RVM/KMP  solutions,  which  are  better  suited  for 
implementation  on  an  AUV.  The  RVM  arrives  at  a  compact  solution  by  starting  with  all  of  the  training 
data  and  then  pruning  data  to  form  a  compact  solution.  The  KMP  starts  with  none  of  the  training  data  and 
builds  up  to  a  compact  solution.  The  KMP  was  chosen  over  the  RVM  because  building  up  a  compact 
solution  is  more  computationally  efficient.  The  KMP  algorithm  was  initially  implemented  in  a  way  that 
treats  each  target  aspect  as  being  independently  drawn  from  an  identical  underlying  distribution.  Unlike 
the  HMM,  this  approach  results  in  a  trained  classifier  that  ignores  any  correlation  between  sequential 
target  responses.  This  motivated  the  design  of  a  correlation-based  KMP  kernel  that  combines  the 
advantages  of  a  discriminative  approach  and  the  sequential  nature  of  the  data.  The  structure  of  the  KMP 
algorithm  used  in  the  sea  tests  is  discussed  below  for  a  generalized  kernel  function. 


The  basic  kernel  matching  pursuits  (KMP)  algorithm  implements  a  set  of  functions  of  the  form 


n  rj~f 

fn(x)=Twn,iK(ci,x)  +  wnfi  =w,^(x)  (1) 

i= 1 

where  wn  0  is  the  bias  term,  K{  • ,  •)  is  the  kernel 

4>„  (•)  =  [1,  ATCC! ,  •),  K(  c2 ,  ^(C„ ,  -)f  (2) 

with  A(c.,  •)  the  kernel-induced  basis  function  centered  at  c. , 

W„  =K(0,wb>1,  wb>2,  •••,  w„„]r  (3) 

are  the  weights  that  combine  the  basis  functions  in  the  summation,  and  the  subscript  n  is  used  to  denote 
the  number  of  basis  functions  being  used.  This  is  the  same  form  as  found  for  the  support  vector  machine 
(SVM)  and  relevance  vector  machine  (RVM),  although  for  the  SVM  K{ c*,  x)  must  be  a  Mercer  kernel, 
while  for  the  RVM  and  KMP  this  is  not  necessary.  We  use  the  KMP  instead  of  the  RVM  or  SVM  because 
of  the  fact  that  the  KMP  has  complexity  that  is  linear  in  the  number  of  training  samples.  The  linear 
relationship  between  the  amount  of  training  data  and  the  computational  effort  is  key  in  allowing  the 
LFBB  program  to  process  the  large  amount  of  training  data  needed  for  a  blind  test  or  real  world 
operations.  A  detailed  outline  of  the  KMP  algorithm  is  given  below. 

Assume  we  are  given  a  training  set  {x., yf}f=l  of  size  A,  where  xz  is  the  z'-th  input  and  yx  its 
expected  output,  the  weighted  sum  of  squared  errors  between  the  expected  output  and  the  KMP  output  is 

=  (1 /z£i Pi )Z£i Pi [V,  -  fn (X, )]2 

=  0 IllxPi^lxPAyi  -w,^;,(x,)]2 


where  J3f  is  a  constant  responsible  for  the  importance  of  the  z-th  training  sample  (x/5  yz).  For  example, 
1/  Pi  may  represent  the  variance  of  the  zth  measurement.  In  addition,  if  one  has  a  priori  knowledge  that 
some  data  xt  are  “better”  representative  of  the  system  being  modeled,  this  can  be  accounted  for  in  the 
parameter  Pt .  The  unknowns  in  equation  (4)  are  the  centers  cz-  of  the  basis  functions  in  §n ,  and  the 

weights  are  represented  by  ww .  The  determination  of  c*  is  addressed  separately  below.  At  the  moment  we 


suppose  c.  and  consequently  are  known  and  aim  at  solving  for  wr  Then  the  value  of  that 
minimizes  equation  (4)  is  easily  found  to  be 


=M 


Def 


where  ^n  i  is  an  abbreviation  of  (xz ) ,  { • }z  =  ( * ) ,  and 

M„  =  I (x,-)^ (x,-)  =  'Tn,i)i 


(5) 


(6) 


is  the  Fisher  information  matrix. 


According  to  the  definition  in  equation  (1),  the  («+  l)-th  order  KMP  is  inductively  written  as 

//7+l(x)  =  W^+|  <)>„+,  (x) 

where 

<1 )„+!  (•)  =  [1,  Arc  Cl ,  •),  K(  c2 ,  K(cn ,  •),  K(cn+l ,  -)f 

=  "4>«0  " 

_^n+ 1  (  ')_ 


(V) 


(8) 


with  <j>n .  |  (.)  =  K( c„+i ,.)  a  new  basis  function  centered  at  c„+i.  The  weighted  sum  of  squared  errors  of  the 
(«+l)-th  order  KMP  is 

e„+i  =0 /S&AOE&Abf  -/«+10,)]2  (9) 

Assuming  the  basis  functions  in  are  all  known,  then  from  equation  (5) 

w«+l  —  {Pifyn+ljVi }/  (1®) 

minimizes  equation  (9),  where  the  Fisher  information  matrix  Mw+i  is  given  as 

~{Pi  ^w+l,z^«+l,z  }/  (H) 

One  may  show  that  ww+i  and  en+\  are  respectively  related  to  w/7  and  en  as 


w„+i  = 

w«  +  M”1  {PAn,i0n+l,i  }ib~\{Pi^Tn,i<l>n+U  )i  ™ n  ~  Vh<t>n+\,i}’i  )i  ] 

b~l  {Pi^A+\,iU^n  +  b~l  Witn+vyih 


(12A) 


en+l  en  3e(K,Cn+ 1) 


(12B) 


where 


Se(K, c„, )  =  (l/£"  p,  W'  l  {P,V,A+u >< *.  -  (AA.uJ’i },  f 


(13) 


and 


(14) 


with  =  K(cw+1,xz-) .  One  may  show  that  b  1  is  a  diagonal  element  of  M^j.  With  sufficient 

training  data  points,  we  can  always  make  Mw+1  positive  definite,  as  can  be  seen  from  equation  (6).  Then 
m;!i  is  also  positive  definite  and  it  holds  b  1  >0,  which  guarantees  5e(K,  cw+1)  is  always  greater  than 
zero.  Therefore,  from  equation  (12B),  en+]  <  en ,  which  means  appending  a  new  basis  function  to  the 
KMP  generally  leads  to  decrease  of  the  representation  error. 


As  Se(K, cn+i)  is  dependent  on  the  center  cn+\  of  the  new  basis  function,  we  obtain  different 
values  of  5e(K,cn+l)  by  selecting  different  cn+i .  If  we  confine  cw+1  to  be  selected  from  the  training 

data,  we  then  can  conduct  a  “greedy”  search  in  the  training  set  but  with  the  previously  selected  data 
excluded  to  avoid  repetition,  and  select  the  datum  that  maximizes  equation  (13).  Formally,  we  have 

C„+ 1  =  X,„+|  =  arg  maxt#.  5  e{K,  xk)  (15) 

1  <k<N 

After  cn+1  is  determined,  we  update  the  weights  using  equation  (12A)  and  the  Fisher  information  matrix 
using  equations  (11)  and  (8). 

From  equation  (13),  Se(K,  cw+1)  depends  on  the  form  of  the  kernel  K{  • ,  •)  as  well  as  on  cw+1 . 
This  allows  us  to  optimize  the  kernel  to  gain  further  error  reduction.  A  simple  approach  to  take  is  to  first 
do  a  “greedy”  search  of  cw+1  in  the  training  set,  for  a  fixed  kernel,  and  then  fix  cw+1  and  optimize  the 

parameters  of  the  kernel.  For  radial  basis  function  (RBF)  kernels,  the  only  parameter  other  than  cw+1  is 
the  kernel  width  a,  thus  optimization  of  RBF  kernels  with  cn+\  fixed  is  a  one-dimensional  search  to 
optimize  a.  It  is  also  possible  to  optimize  Cn+l  and  the  kernel  width  a  simultaneously,  but  then  cn+1  is 

treated  as  a  free  parameter  and  no  longer  confined  to  the  training  set.  Another  possibility  is  optimization 
over  kernels  of  different  functional  forms,  which  offers  greater  diversity  of  the  set  of  functions  able  to  be 
implemented  by  the  KMP. 


For  the  M-class  classification  problem,  one  builds  M  models  defined  in  equation  (1).  Suppose  the 
training  samples  are  where  xi  *s  an  observed  datum  and  y-t  e  {1,2 is  its  target  label. 

One  re-labels  the  training  data  for  each  of  the  M  models  in  the  following  way.  Let  the  labels  for  the  m-th 
model  be  denoted  as  yjm^ ,  i  =  1,2,...,  N ,  then 

,<”>=!  '■  (16) 

4  [0,  otherwise 

The  learning  is  based  on  simultaneous  minimization  of  the  empirical  risks  for  the  M  models.  Thus  the 
cost  function  of  the  KMP  in  the  classification  case  is 

Note  that  the  M  models  have  their  own  weights  but  share  the  same  basis  functions.  As  in  the  case  of  the 
basic  KMP,  we  first  solve  for  the  weights  assuming  the  basis  functions  (kernel  parameters)  are  fixed.  This 

is  done  by  taking  derivative  of  equation  (17)  with  respect  to  ,  setting  the  result  to  zero,  and  solving 
for 

w«=M;1{(| (18) 


where  M  n  is  the  same  as  in  equation  (6).  Following  the  same  methods  that  were  used  to  derive  equation 
(12),  we  obtain 


+M-1{A4„A+u}**“1[^»CA+u}fwSr,)  ~{fiA+uy\m)}i\ 


(19A) 


and 


where 


en+ 1  en  de(K,Cn+ 1) 


P,)b 


-{PA.uyT1},] 


(m)'| 


(19B) 


(20) 


with  b  the  same  as  in  equation  (14).  The  learning  of  KMP  classifiers  proceeds  in  a  similar  generative  way 
as  described  above.  At  the  n- th  iteration,  we  first  select  from  the  training  data  set  (with  the 

previously  selected  data  excluded)  that  maximizes  equation  (20),  to  locate  the  new  basis  function,  and 


then  use  equation  (19A)  to  update  the  weights.  We  can  similarly  optimize  the  kernels  in  the  KMP  M- ary 
classifiers,  using  equation  (20)  as  the  objective  function  to  optimize  kernel  parameters  or  select  different 
kernel  functional  forms. 

An  important  observation  from  the  above  procedure  is  that  we  have  not  employed  a  link  function 
to  extend  the  regression  problem  to  classification.  Link  functions  have  been  employed  in  the  RVM 
method.  We  do  not  employ  a  link  function  here  because  the  simple  and  importantly  iterative  form  of  the 
solution  in  equations  (19)  and  (20)  is  driven  by  the  linear  model  in  equation  (1).  This  linearity  is  lost  by 
introducing  a  link  function.  Note  that  for  the  M- ary  classification  problem  discussed  above  we  drive  the 
label  ym(Xi)  to  unity  if  the  xz  is  associated  with  class  m,  and  it  is  fit  to  zero  otherwise,  with  this  solved 
jointly  for  all  M  classes,  yielding  the  weights  in  equation  (19A).  A  careful  examination  of  the  RVM,  for 
example,  yields  the  recognition  that  ym(xz)  is  driven  toward  a  large  positive  number  if  X/  corresponds  to 
class  m,  and  it  is  driven  toward  a  large  negative  number  otherwise.  In  this  sense  the  approach  discussed 
above  is  closely  related  to  the  RVM,  and  the  choice  of  numbers  other  than  zero  and  one  in  equation  (16) 
simply  reflects  a  scaling  of  the  weights.  In  this  sense  the  KMP  and  RVM  treatment  of  the  M- ary 
classification  problem  are  closely  related.  The  main  distinction  is  that,  by  employing  a  link  function,  the 
RVM  yields  a  classification  as  well  as  a  probabilistic  measure  of  confidence  (in  terms  of  a  number 
between  zero  and  one)  in  that  classification. 

II.  Correlation  Kernel 

A  correlation-based  kernel  has  produced  the  most  successful  classification  results  tested  thus  far. 
Therefore,  this  method  was  implemented  in  the  sea  tests  as  described  below.  The  correlation  kernel  is 
characterized  by  the  direct  processing  of  contiguous  sets  of  angle-dependent  frequency-domain 
signatures,  without  explicit  feature  extraction.  Figure  1  illustrates  the  process.  Given  a  set  of  target 

complex  frequency  responses  { rm  }^=1  obtained  at  M  regular  aspect  intervals  around  a  target,  unwanted 

frequency  bands  are  initially  filtered  out  of  the  spectrum  (see  Figure  1(a)).  The  remaining  frequency 
bands  taken  over  a  span  of  26  degrees  of  adjacent  target  response  angles  are  now  employed  directly 
within  the  kernel.  We  define  a  two-dimensional  spectral  sequence  centered  at  aspect  n , 
sn  ~  Vn-e  ••••  rn  •  •••  rn+e  ]  >  where  sn  isa  F  x  (26  + 1)  data  matrix  (see  Figures  1(b)  and  1(c)).  We  obtain 
N  spectral  sequences  centered  at  a  uniform  increment  over  the  range  of  M  regular  aspect  intervals. 


Figure  1  —  Data  processing  for  multi-aspect  KMP  with  a  correlation  kernel.  Unwanted  frequency  bands  are  filtered  out  of  the 
sequence  of  target  aspects  in  (a),  the  aspects  are  then  divided  into  spectral  sequences  (b),  in  which  a  single  sequence 
becomes  a  feature  matrix  (c). 


For  a  set  of  spectral  sequences,  }^=1 ,  the  multi-aspect  KMP  kernel  is  defined  as 


K(  si,sj)  =  e 


_  „~d2/2a2 


(21) 


l-corr(siSs .) 

where  d  = - j - r—  is  a  measure  of  distance  between  two  spectral  sequences  based  on  a 

corr[si,sj) 

correlation  coefficient.  The  correlation  is  performed  only  along  the  aspect  dimension  (i.e.,  one  strip  is 
slid  over  the  other  along  the  aspect  axis  and  the  correlation  is  calculated  at  every  shift).  Efficient 
numerical  computation  of  the  correlation  is  achieved  through  the  use  of  fast  Fourier  transforms.  Let  St 

and  Sj  represent  two  spectral  sequences  for  which  one-dimensional  FFTs  have  been  computed  along  the 

2  Ttva  . 

aspect  direction  for  each  frequency  f  Formally,  we  have  St(f,v)  =  ^  si(f,a)e  (26>+1)  ,  and  the 
raw  correlation,  /(a) ,  is  computed  as 


r(a)  =  ZT  s,  (/,  v)  x  S‘  (/,  v)e 


^2(9+1 


27iv  a  . 
(20+1)1 


(22) 


which  results  in  a  correlation  value  at  each  shift  angle  a  .  Taking  the  maximum  shift  correlation  and 
normalizing  by  the  energy  of  the  spectral  sequences,  we  have 


max 


corr(  s,.,s.)  = 


\y(a)\ 


max| 

a 


max| 

a 


(23) 


The  above  expression,  the  absolute  correlation  coefficient  between  two  signal  sequences,  lies  between  0 
and  1  (with  0  being  no  correlation,  and  1  representing  perfect  correlation  between  two  sequences).  Given 
the  kernel  defined  in  equation  (21),  the  KMP  algorithm  obtains  a  set  of  basis  functions  and  the  associated 
weights,  where  each  basis  function  corresponds  to  a  spectral  sequence  of  contiguous  aspects,  rather  than 
single-aspect  observations  in  traditional  KMP  or  RVM. 

In  the  testing  phase,  our  objective  is  to  obtain  the  probability  or  likelihood  of  a  given  unlabeled 
sequence  of  target  responses  belonging  to  any  target  or  target  class.  In  generative  training,  each  target  is 
associated  with  a  parametric  likelihood  model,  whereas  for  the  discriminative  case,  we  obtain  a  classifier 
between  two  or  more  targets.  Given  an  unlabeled  sequence,  we  obtain  the  set  of  spectral  sequences 
similar  to  the  training  phase.  The  probability  of  the  single  sequence  belonging  to  a  “trained”  target  class 
is  given  by 

p(s test  I  target  =  j)  =  +  Ylli  W1  K(s^t , s!  I  (24) 


where  wJ  =  {wJ0....wJb  }  represents  the  weight  vector  corresponding  to  target  j,  and  {s/}^  represent  the 

corresponding  basis  functions.  Note  that  stest  and  S-  need  to  be  the  same  size  and  the  parameter  bj 
denotes  the  number  of  basis  functions  used  for  target  j. 


III.  Sea  Test  Results 


NRL  has  performed  several  sea  tests  with  successful  mine  identification  results.  However,  the 
feature  extraction  and  correlation  calculations  are  computationally  intensive.  The  data  collected  from  a 
sea  test  generates  thousands  of  test  objects,  which  makes  near-real  time  analysis  challenging.  Additional 
development  has  greatly  improved  the  computational  efficiency  of  the  algorithms.  Figure  2  illustrates  the 
reduction  in  discriminative  training  execution  time  achieved  by  optimizing  the  correlation  computations, 
without  altering  the  computed  results.  The  increase  in  processing  speed  is  dependent  upon  several 
algorithm  parameters,  such  as  the  size  of  the  frequency  bands,  F,  and  the  number  of  adjacent  angles  per 
spectral  sequence,  26  (see  Figure  1(c)).  In  general,  the  total  execution  time  for  complete  training  and 
testing  has  been  reduced  by  50-75%. 
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Figure  2  —  Execution  time  as  a  function  of  the  number  of  360°  training  data  files.  The  black  dots  represent  the  original  code, 
while  the  red  and  green  dots  represent  added  algorithmic  optimizations. 


In  May  2007,  NRL  performed  a  sea  test  off  the  coast  of  Corpus  Christi,  Texas.  The  following  sea 
test  results  were  produced  by  using  a  4-23kHz  frequency  band  and  a  20  =  90°  sequence  of  angles.  Figure 
3  displays  the  results  of  Blind  Test  1  with  the  AUV  traversing  north  and  south  lines.  Approximately  1500 
objects  (the  blue  dots)  were  classified  by  the  algorithm.  The  classification  algorithm  perfectly  identified 
the  location  of  every  cylindrical  mine,  while  the  stealthy  targets  proved  more  challenging.  Similar  results 
were  achieved  on  additional  blind  tests. 
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Figure  3  —  Mine  identification  on  the  Corpus  Christi  Blind  Test  1  north/south  lines.  The  blue  dots  represent  all  objects  tested. 

The  red  circles  indicate  the  known  types  and  locations  of  mines.  A  red  X  marks  an  object  identified  as  a  mine  based  on 
the  results  of  the  classification  algorithm. 


